Instance Classification using Co-Occurrences on the Web

Authors

  • Gijs Geleijnse
  • Jan Korst
  • Viktor de Boer

Abstract

We present a novel unsupervised approach to mapping art-related instances (such as music artists and painters) to subjective categories such as genre and style. The approach is based on co-occurrences of instances and categories on the web, found with Google. We find these co-occurrences with three methods: by identifying search engine hit counts, by analyzing Google excerpts returned for pattern queries, and by scanning full documents. For each instance, we use the same co-occurrence-based approach to find its nearest neighbors, i.e. the most related instances. These results can be combined to create a more reliable classification. We tested and compared the three methods on two domains: mapping music artists to genres, and painters to art styles. The results show that using related instances does improve the precision of the classification. Moreover, the methods with the lowest Google Complexity perform best.
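The abstract describes two steps. First, categories are scored from co-occurrence counts. Second, each instance's scores are reinforced with those of its most related instances. The Python sketch below illustrates only the count-based variant of that idea, and it rests on several assumptions. The counts are supplied as plain dictionaries, so no search engine is queried. A category's score is its co-occurrence count divided by the category's own frequency, normalized over all categories. Neighbor scores are averaged and added with a fixed weight. These formulas, the helper names (`category_scores`, `nearest_neighbors`, `classify`), the parameters `k` and `weight`, and the toy numbers are all hypothetical illustrations, not the authors' exact method.

```python
"""Hypothetical sketch: co-occurrence-based instance classification.

Assumptions (not taken from the paper): counts are given as dictionaries,
score(i, c) = co(i, c) / freq(c) normalized over categories, and the scores
of the k most related instances are averaged and added with a fixed weight.
"""


def category_scores(instance, categories, cooc, cat_freq):
    """Normalized score per category for one instance (sums to 1 unless all zero)."""
    # Dividing by the category frequency keeps very common category names from dominating.
    raw = {c: cooc.get((instance, c), 0) / max(cat_freq.get(c, 1), 1) for c in categories}
    total = sum(raw.values())
    if total == 0:
        return {c: 0.0 for c in categories}
    return {c: v / total for c, v in raw.items()}


def nearest_neighbors(instance, instances, inst_cooc, k):
    """The k other instances that co-occur most often with `instance`."""
    others = [j for j in instances if j != instance]
    others.sort(key=lambda j: inst_cooc.get(frozenset((instance, j)), 0), reverse=True)
    return others[:k]


def classify(instance, instances, categories, cooc, cat_freq, inst_cooc, k=2, weight=0.5):
    """Pick the category with the highest own score plus weighted neighbor scores."""
    combined = category_scores(instance, categories, cooc, cat_freq)
    neighbors = nearest_neighbors(instance, instances, inst_cooc, k)
    for j in neighbors:
        for c, s in category_scores(j, categories, cooc, cat_freq).items():
            # Average the neighbors' contributions, then scale by `weight`.
            combined[c] += weight * s / len(neighbors)
    return max(categories, key=lambda c: combined[c])


if __name__ == "__main__":
    # Invented toy counts, for illustration only.
    instances = ["Artist A", "Artist B", "Artist C"]
    categories = ["jazz", "rock"]
    cooc = {
        ("Artist A", "jazz"): 10, ("Artist A", "rock"): 12,
        ("Artist B", "jazz"): 50, ("Artist B", "rock"): 5,
        ("Artist C", "jazz"): 40, ("Artist C", "rock"): 8,
    }
    cat_freq = {"jazz": 100, "rock": 100}
    inst_cooc = {
        frozenset(("Artist A", "Artist B")): 30,
        frozenset(("Artist A", "Artist C")): 25,
        frozenset(("Artist B", "Artist C")): 5,
    }
    print(classify("Artist A", instances, categories, cooc, cat_freq, inst_cooc, k=0))  # rock
    print(classify("Artist A", instances, categories, cooc, cat_freq, inst_cooc, k=2))  # jazz
```

With `k=0` the sketch classifies an instance from its own counts alone. In the toy data, adding the two related artists flips Artist A from rock to jazz. This is the kind of correction the abstract attributes to using related instances.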

Similar resources

A New Document Embedding Method for News Classification

Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into predefined categories. A great deal of news is spread on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to it. The first step in text classification is to represent documents in a suitable way t...

The Intellectual Structure of Knowledge in the Field of Distance Education Using the Co-Word Analyses

Background: Co-word analysis is one of the content analysis methods used in scientometric studies and in mapping the scientific structure of various fields. The purpose of the present research is to map the structure of distance education using co-word analysis. Methods: The research method is content analysis using co-word analysis. The research population consists of 31,607 documents indexed in the...

Using jWebMiner 2.0 to Improve Music Classification Performance by Combining Different Types of Features Mined from the Web

This paper presents the jWebMiner 2.0 cultural feature extraction software and describes the results of several musical genre classification experiments performed with it. jWebMiner 2.0 is an easy-to-use and open-source tool that allows users to mine the Internet in order to extract features based on both Last.fm social tags and general web search string co-occurrences extracted using the Yahoo...

Image Steganalysis Based on Co-Occurrences of Integer Wavelet Coefficients

We present a steganalysis scheme for LSB matching steganography based on feature vectors extracted from the integer wavelet transform (IWT). In the integer wavelet decomposition of an image, the coefficients are integers, so we can calculate their co-occurrence matrix without rounding them. Before calculating the co-occurrence matrices, we clip some of the most significant bitplanes of ...

Analyzing Relatedness by Toponym Co-Occurrences on Web Pages

This research proposes a method for capturing “relatedness between geographical entities” based on the co-occurrences of their names on web pages. The basic assumption is that a higher count of co-occurrences of two geographical places implies a stronger relatedness between them. The spatial structure of China at the provincial level is explored from the co-occurrences of two provincial units i...

Publication date: 2006